ABSTRACT

The laboratory mouse is the premier animal model 
for studying human biology because all life stages 
can be accessed experimentally, a completely 
sequenced reference genome is publicly available 
and there exists a myriad of genomic tools for com- 
parative and experimental research. In the current 
era of genome scale, data-driven biomedical 
research, the integration of genetic, genomic and 
biological data is essential for realizing the full potential
of the mouse as an experimental model. The
Mouse Genome Database (MGD; http://www.informatics.jax.org),
the community model organism
database for the laboratory mouse, is designed to 
facilitate the use of the laboratory mouse as a 
model system for understanding human biology 
and disease. To achieve this goal, MGD integrates 
genetic and genomic data related to the functional 
and phenotypic characterization of mouse genes 
and alleles and serves as a comprehensive catalog 
for mouse models of human disease. Recent enhancements
to MGD include the addition of human
ortholog details to mouse Gene Detail pages, the inclusion
of microRNA knockouts in MGD's catalog of
alleles and phenotypes, the addition of video clips to
phenotype images, access to genotype
and phenotype data associated with quantitative
trait loci (QTL) and improvements to the layout and
display of Gene Ontology annotations.

INTRODUCTION 

The laboratory mouse is widely recognized as the premier 
animal model for investigating genetic and cellular 
systems relevant to human biology and disease. A large 
arsenal of experimental genetic tools is available for 
mouse, including unique inbred strains, a complete reference
genome (and deep-sequencing data for 17 additional
inbred lines), extensive genome variation maps (e.g. Single
Nucleotide Polymorphisms) and technologies for directly
and specifically manipulating the mouse genome. An
international effort to knock out all mouse genes has
produced an ES cell line resource covering over 18 000
genes (1), and the phenotyping phase has begun (2). New
resources for complex trait mapping, including the
Collaborative Cross and Diversity Outbred mice, are beginning
to emerge (3,4). In the arena of human genetics
and genomics, exome sequencing and the quest for ever
lower cost genome sequences will again change the
way we approach computational and experimental
methods for understanding the biology of the genome.
The mouse is essential for the functional analysis and an- 
notation of rapidly emerging human genomes through 
comparative genomics. 

Realizing the full power of the mouse as a model of 
human biology depends, in part, on integrating the 
diverse genetic, genomic and phenotypic data for the 
mouse in ways that promote experimental and transla- 
tional research. The central objective of the Mouse 
Genome Database (MGD) is to provide an integrative 
and comparative bioinformatics resource that supports 
the effective translation of information from experimental 
mouse models to uncover the genetic basis of human 
diseases. MGD is the highly curated, community model 
organism database for the laboratory mouse, providing
web and programmatic access to a complete catalog of 
mouse genes and genome features integrated with func- 
tional annotations, a comprehensive catalog of mutant 
alleles, phenotype annotations, human disease model an- 
notations, variation data and sequence data. MGD went 
online via the World Wide Web in 1994, unifying and 
harmonizing several different databases of genetic map 
and allele information for the laboratory mouse. MGD 
has evolved rapidly, re-tooling and enhancing the 
database to adapt to the multitude of new data types, de- 
veloping and upgrading data access tools for an increas- 
ingly diverse community of researchers, and adopting new 
database and software technologies as they have emerged 
and matured. 
MGD is the central component of a number of
coordinated genome informatics projects that are part of 
the Mouse Genome Informatics (MGI) consortium 
(http://www.informatics.jax.org). Other database re- 
sources available through the MGI web portal include 
the Gene Expression Database (GXD) (5), the Mouse 
Tumor Biology Database (6), the Gene Ontology (GO) 
project (7) and the MouseCyc database of biochemical 
pathways (8). Taken together, these resources provide a 
combination of data breadth, depth, integration and 
quality that exists nowhere else for mouse. 

IMPROVEMENTS 

The curation efforts within MGD focus on maintaining a 
catalog of genes and other genome features, functional 
annotation of mouse genes using Gene Ontology terms, 
annotation of phenotypes associated with genotypes using 
terms from the Mammalian Phenotype Ontology and the 
association of mouse models with human disease. Data
releases for MGD occur weekly. A summary of the
database content for MGD is given in Table 1.

Enhanced human ortholog detail display 

A banner displaying information about the human 
ortholog of each mouse gene was added to the Gene 
Detail pages in MGD to improve comparisons of gene- 
disease associations in mouse and human. The human 
ortholog detail stripe is positioned above the section of 
the Gene Detail page that describes alleles and phenotypes 
for the mouse gene (Figure 1). For each human ortholog,
the name and location of the human gene are provided and,
if relevant, a list of associated diseases according to the Online
Mendelian Inheritance in Man (OMIM) resource (9)
is displayed. Together, the human ortholog and
alleles/phenotypes sections of the Gene Detail page help
researchers identify cases
where the human gene is associated with a disease and
the mouse gene is not (or has yet to be specifically tested
as a model) (Figure 1).

Table 1. Summary of MGD content, September 2012

Database content category                               September 2012
Number of genes and genome features                             37 501
Number of genes with nucleotide sequence data                   29 127
Number of genes with protein sequence data                      24 738
Number of mouse genes with human orthologs                      17 773
Number of mouse genes with rat orthologs                        17 253
Number of genes with Gene Ontology (GO) annotations             25 452
Total number of GO annotations                                 240 562
Number of mutant alleles                                       712 279
Targeted mutations                                              47 649
Number of QTL                                                     4715
Number of genotypes with phenotype annotation                   44 775
Total number of Mammalian Phenotype (MP) annotations           234 495
Number of mouse models associated with human diseases             3829
Number of references in the MGD bibliography                   181 223

By providing information on concordant and discordant
instances of mutations in orthologous genes resulting
in phenotypes that model specific human diseases, MGD
can be used to discover potential candidate genes for
human diseases that have no gene associations in
human, and to discover mutations in mice that should
be examined as new models of human disease. For
example, the spermatogenesis associated 16 (Spata16;
MGI:1918112) gene is the mouse ortholog of the
human SPATA16 (HGNC:29935) gene. In humans, mutations
in this gene are associated with Spermatogenic
Failure 6 (SPGF6) (OMIM 102530). In mouse, there are
currently three alleles for the Spata16 gene; however, all of
these mutants exist only in ES cell lines, thus representing
potential sources of mouse models for this disease once the
ES cells are made into mice and phenotyped. Conversely,
one can observe where a mouse disease model has been 
associated with a human disease, but there is not yet 
evidence for the human-disease association to the 
human ortholog. For example, the mouse cholinergic
receptor, muscarinic 3, cardiac (Chrm3; MGI:88398)
gene is a model for human Megacystis-Microcolon-Intestinal
Hypoperistalsis Syndrome (OMIM 249210).
Thus, study of existing mouse models can facilitate dis- 
covery of candidates for disease genes in human. 

In some cases, alleles of the mouse gene are associated
with human disease phenotypes that differ from associations
reported in OMIM. For example, for the mouse
caveolin 1 gene (Cav1; MGI:102709), the human
ortholog (CAV1) is associated with congenital
lipodystrophy (OMIM: 612526). However, the genotypes
in mouse are associated with human breast cancer
(OMIM: 114480) and Alzheimer's disease (OMIM:
104300) but not with lipodystrophy. The bicaudal C
homolog 1 (Bicc1; MGI:1933388) gene in mouse is
reported as a model for three human diseases in OMIM
[Heterotaxy (HTX5), OMIM: 270100; Polycystic Kidney
Disease 1 (PKD1), OMIM: 173900; and Polycystic Kidney
Disease, Autosomal Recessive (ARPKD), OMIM: 263200].
In contrast, the human ortholog, BICC1
(HGNC:19351), is not associated with any disease according
to OMIM.
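
The comparison described in the preceding paragraphs can be sketched as a set comparison between the diseases linked to a human gene and the diseases modeled by alleles of its mouse ortholog. The following is a minimal illustration only, not MGD's implementation: the dictionaries hold just the three examples given in the text, keyed by human gene symbol, and the function name and category labels are assumptions introduced here.

```python
# Disease associations transcribed from the three examples in the text.
# The data structures, function name and category labels are illustrative
# assumptions; they do not describe how MGD stores or computes these data.
HUMAN_GENE_DISEASES = {
    "SPATA16": {"OMIM:102530"},   # Spermatogenic Failure 6
    "CAV1": {"OMIM:612526"},      # congenital lipodystrophy
    "BICC1": set(),               # no disease association in OMIM
}
MOUSE_MODEL_DISEASES = {
    "SPATA16": set(),                            # alleles exist only as ES cell lines
    "CAV1": {"OMIM:114480", "OMIM:104300"},      # breast cancer, Alzheimer's disease
    "BICC1": {"OMIM:270100", "OMIM:173900", "OMIM:263200"},
}


def classify(human_symbol):
    """Label a human gene by comparing its diseases with the mouse models."""
    human = HUMAN_GENE_DISEASES.get(human_symbol, set())
    mouse = MOUSE_MODEL_DISEASES.get(human_symbol, set())
    if human & mouse:
        return "concordant"       # at least one disease shared
    if human and mouse:
        return "discordant"       # both annotated, nothing shared
    if human:
        return "human only"       # candidate for a new mouse model
    if mouse:
        return "mouse only"       # candidate human disease gene
    return "no associations"


if __name__ == "__main__":
    for symbol in sorted(HUMAN_GENE_DISEASES):
        print(symbol, classify(symbol))
```

With these inputs, SPATA16 falls in the "human only" group, CAV1 is "discordant" and BICC1 is "mouse only", matching the three cases discussed above.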

microRNA knockouts 

In recent years, the importance of small regulatory RNAs, 
including microRNAs, in posttranscriptional gene regula- 
tion has been recognized. Mice carrying targeted muta- 
tions in microRNAs are important resources for 
characterizing the biological functions of these molecules. 
Several initiatives have been launched to generate ES cell 
lines and mice with targeted mutations in microRNAs 
(10,11). MGD has added these emerging microRNA
'knockouts' to the comprehensive catalog of alleles and
phenotypes in mouse. Details for microRNA alleles
include a description of the mutation, links to published
references, a description of observed phenotypes if
available and links to the International Mouse Strain 
Resource (12,13) for information on the availability of 
strains or cell lines that carry a specific microRNA 
allele. To date, 434 alleles in 284 microRNAs have been 
entered into MGD. Although many of these mutant alleles 
are available as ES cell lines, approximately 170 have been 
made into live mice. With respect to phenotype
annotations, 67 of the microRNA knockout mice in MGD
have phenotype annotations, 5 have no abnormal phenotype
and 98 have yet to be phenotyped. As reports appear
in the published literature or through large-scale mouse
phenotyping projects, the annotations for microRNA
knockouts will be updated.

Figure 1. Screenshots showing the new Human Ortholog and Phenotypic Alleles sections of the MGD Gene Detail page. (A) The SPATA16 gene in
humans is associated with a human disease entry in the Online Mendelian Inheritance in Man database, whereas alleles of the orthologous mouse
gene have yet to be phenotyped to determine if mutations in the mouse gene result in a good model of the human disease. (B) The CAV1 gene in
humans is associated with congenital lipodystrophy, but the available mouse genotypes have not been associated with this disease. Phenotypic alleles
of the Cav1 gene in mouse have been reported for models of human diseases (breast cancer and Alzheimer's disease) that are not associated with the
human ortholog. (C) The BICC1 gene in human has not yet been reported as being associated with any human disease, whereas the mouse ortholog
has been reported to model several human diseases.

Phenotype videos 

MGD has regularly included still images that illustrate 
mouse phenotypes associated with alleles and genotypes. 
Brief video clips of mouse phenotypes have been added 
recently to provide a new dimension of information on the 
phenotypic consequences of genomic variants. The over 
340 phenotype videos available in MGD are presented 
as YouTube® clips embedded in the web pages. These 
videos were generated by the National Heart Lung and 
Blood Institute's Bench-to-Bassinet program within the 
Cardiovascular Development Consortium. The imaging 
modalities represented include Episcope Fluorescence 
Image Capture (EFIC) image stacks, video microscopy, 
ultrasound imaging and micro-CT scans. If phenotype 



images or videos for alleles of a specific gene are available, 
a direct link to the images can be found in the Alleles and 
Phenotypes section of the Gene Detail pages in MGD and 
on the Phenotype Detail pages for specific mutant alleles. 
Figure 2 shows a link to the 15 phenotype images 
associated with alleles of the bicaudal C homolog 1 
(Bicc1; MGI:1933388) gene; one of the available
images for one of the Bicc1 alleles is a 2D serial EFIC
image stack of the heart in coronal view. Investigators 
can submit phenotype videos for either existing or new 
alleles reported in MGD by following the Submit Data 
link on the MGI home page and following the instructions 
for data file submissions. 

Access to genotype and phenotype data for mouse QTL 

MGD staff curate published reports of quantitative trait
locus-mapping experiments and, where possible, translate
the mapping data into genome coordinates so that regions
of the genome associated with mapped phenotypes can be
displayed in a genome context. Reciprocal links have been
established between quantitative trait loci (QTL) records
in MGD and records in the QTL Archive
(http://www.qtlarchive.org/). The QTL Archive extends the utility of
mapped phenotypes in MGD by providing researchers
with access to the underlying genotype and phenotype
data used to map a QTL. Of the 4715 QTL marker
records in MGD, over 750 have data available in the
QTL Archive.

Improvements in GO annotation completeness and 
visualization 

MGD is one of the founding members of the Gene 
Ontology Consortium (GOC) (14,15) and provides 
major contributions to the development of the GO 
ontologies and to developing GO community standards 
for curation of the scientific literature. MGD project 
curators are responsible for annotating mouse genes and 
gene products to GO ontology terms. 

Improvements in the GO knowledge representation and
annotation procedures are incorporated into MGD functional
annotation workflows as they are developed (see
Gene Ontology Consortium (7)). Updated ontologies are
loaded into the MGI system and mouse GO annotations
are contributed to the GOC on a weekly basis. MGD has
the community responsibility of providing a non-redundant
set of mouse GO annotations to the research community
through the MGI database resource and through the
GOC annotation repository and database. The UniProt-Gene
Ontology Annotation (GOA) project is the other
major provider of mouse GO annotations (16).

New visualization paradigms for displaying GO annotations
have been implemented in MGD (Figure 3). The
text-based summaries of gene/protein function previously
displayed have been replaced by shorter summary statements
obtained from NCBI's RefSeq resource (17) for
each gene. RefSeq summaries include the source of the
summarized information as well as the date the information
was last updated. When RefSeq statements are not
available for the mouse gene, statements pertaining to the
orthologous human gene are included. Orthology assertions
between mouse and human genes are taken from
NCBI's HomoloGene resource (18). The previously available
tabular and graphical options for displaying GO annotations
are still supported in MGD.



DATA SUBMISSION 

Most of the data in MGD come from semi-automated
curation of the peer-reviewed scientific literature and from
collaborative/cooperative arrangements with large,
mouse-related data centers and repositories and other informatics
resources. MGD also supports electronic data
contributions directly from individual researchers.



Any type of data that MGD maintains can be submitted
as an electronic contribution. Common types of submission
include mutant and QTL-mapping data. Each
electronic submission receives a permanent database accession
ID. All data sets are associated with their source,
either a publication or an electronic submission reference.
MGD reference pages provide links to associated data
sets. On-line information about data submission procedures
is found at the URL:
http://www.informatics.jax.org/submit.shtml.

[Screenshot: Gene Ontology Classifications page for Kitl (kit ligand; MGI:96974), showing the NCBI RefSeq summary followed by summary text based on GO annotations.]

Figure 3. Screenshot showing the new Gene Ontology annotation display in MGD.



COMMUNITY OUTREACH AND USER SUPPORT 

MGD provides extensive user support through on-line 
documentation, email and phone access to User Support 
Staff. 

User Support can be accessed by: 

• World Wide Web: http://www.informatics.jax.org/mgihome/homepages/help.shtml;

• Email access: mgi-help@informatics.jax.org; 

• Telephone access: +1 207 288 6445; and 

• FAX access: +1 207 288 6132. 

Additional outreach and support are provided by a
moderated email bulletin board, MGI-LIST
(http://www.informatics.jax.org/mgihome/lists/lists.shtml). MGI-LIST
is managed by the MGI User Support team and has over
2000 subscribers and an average of 75 posts/discussions per
month.



SYSTEM OVERVIEW 

The software, database and hardware components
comprising MGD are organized into a front end, where
the data are made available to the public, and a back end,
where data are loaded, curated and integrated. Most of the
components that were previously supported by a Sybase
(http://www.sybase.com) relational database management
system have been replaced with a combination of
PostgreSQL (http://www.postgresql.org) and Solr/Lucene
indexes (http://lucene.apache.org/solr). Solr is an
enterprise search server built on the Lucene text searching
library. It provides powerful and fast text searching via an
application programming interface (API) over the web
(via HTTP). Components maintained outside of the
main MGD system include BLAST-able databases and
genome assemblies, the databases that support the Mouse
GBrowse (19) resource and the MGI BioMart (20,21)
instance.
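
To make the "search over HTTP" point concrete, the sketch below builds a request URL for Solr's standard /select search handler using its common q, rows and wt parameters. It is an illustration under stated assumptions, not a description of MGD's deployment: the host, port and the core name "markers" are hypothetical, and MGD's actual index names and schema are not given in this article.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url, core, text, rows=10):
    """Return a URL for Solr's standard /select search handler.

    base_url and core are placeholders supplied by the caller; the
    values used in the example below are hypothetical.
    """
    params = {
        "q": text,      # the search text
        "rows": rows,   # maximum number of hits to return
        "wt": "json",   # request a JSON-formatted response
    }
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical local Solr instance with a hypothetical "markers" core.
    print(build_solr_query_url("http://localhost:8983", "markers", "kit ligand"))
```

A front end built this way sends text queries to the index over HTTP and keeps the relational database for the curated source data.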

There are two primary means by which data are entered 
into MGD: the editing interface (EI) and automated load 
programs. The EI is an interactive and graphical applica- 
tion. Curators use the EI to enter new data from the 
literature, to verify the results of automated loads and to 
correct errors. The automated load programs integrate 
larger data sets from many sources into the database. 
Automated loads involve quality control checks and 
processing algorithms that integrate the bulk of the 
data automatically and identify issues to be resolved by 
curators or the data provider. Through these two vehicles,
the EI and automated loads, MGD is able to scale and
adapt as new data sources for the mouse are made
available.
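
The division of labor between automated loads and curators can be pictured as a load step that applies quality control checks and sets aside any record that fails. The sketch below is a simplified illustration, not MGD's load software: the record fields, the three checks and the function name are assumptions chosen for the example, and the reference values are placeholders.

```python
import re

# Accept identifiers of the form "MGI:" followed by digits, with no spaces.
MGI_ID = re.compile(r"^MGI:\d+$")


def load_records(records, known_ids):
    """Split incoming records into those safe to load and those needing review.

    Each record is a dict with hypothetical keys "mgi_id" and "reference".
    Returns (loaded, needs_review), where needs_review pairs each rejected
    record with the list of problems found.
    """
    loaded, needs_review = [], []
    for rec in records:
        problems = []
        mgi_id = rec.get("mgi_id") or ""
        if not MGI_ID.match(mgi_id):
            problems.append("malformed MGI ID")
        elif mgi_id not in known_ids:
            problems.append("unknown MGI ID")
        if not rec.get("reference"):
            # Every data set must be tied to a publication or submission.
            problems.append("missing source reference")
        if problems:
            needs_review.append((rec, problems))
        else:
            loaded.append(rec)
    return loaded, needs_review


if __name__ == "__main__":
    incoming = [
        {"mgi_id": "MGI:96974", "reference": "ref-1"},   # passes all checks
        {"mgi_id": "MGI: 88398", "reference": "ref-2"},  # stray space in the ID
        {"mgi_id": "MGI:96974", "reference": ""},        # no source given
    ]
    ok, review = load_records(incoming, known_ids={"MGI:96974"})
    print(len(ok), "loaded;", len(review), "sent to curators")
```

Records that pass are integrated automatically, while the review list corresponds to the issues that the text says are resolved by curators or the data provider.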

Access to information in MGD is provided in several 
ways to support our diverse community of users including 
the web interface, Batch Query tool, FTP and a web 
services API. 

Web interface 

Interactive web-based interfaces are the primary means of 
access to MGD. The keyword-based 'Quick Search'
option on the MGI home page is the most commonly 
used search tool for single concept searches. The Batch 
Query tool (19) is a component of the MGD web interface 
that enables searches by lists of genes. It can be used as an 
accession ID translator (e.g. to convert a list of MGI IDs 
to the corresponding list of EntrezGene IDs) or as a way 
to retrieve a set of information for a collection of genes/ 
features (e.g. to obtain all the GO annotations for a list of 
genes). In either case, the input is a user-specified collec- 
tion of IDs in a text field or file upload. Gene symbols and 
a wide variety of ID types are accepted, including IDs 
from MGI, EntrezGene, Ensembl, Havana/Vega, 
GenBank, RefSeq, UniProt, RefSNP, Affymetrix, GO, 
etc. Users also select their desired output, including anno- 
tations from GO, MP, OMIM; phenotypic alleles; gene 
expression results (from GXD); or any of the above ID 
types. The Batch Query maps each input ID to any corresponding
genes/features in MGD (there may be more than
one) and returns them with the requested data. The
Batch Query is fully integrated with the MGD web interface
and is called from various pages to generate a user-customizable
gene/feature summary. Results are available
in HTML, tab-delimited or Excel format.
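
The mapping behavior described above, in which one input ID may resolve to zero, one or several genes/features, can be sketched as follows. This is a toy model rather than the Batch Query implementation: the lookup table, function names and column headers are assumptions, the MGI IDs and symbols are taken from examples in this article, and the "EXT:0001" identifier is a placeholder invented to show a one-to-many match.

```python
import csv
import io

# Hypothetical lookup table: input ID -> list of (MGI ID, symbol) matches.
# "EXT:0001" is a made-up external ID used only to show a one-to-many match.
ID_INDEX = {
    "MGI:96974": [("MGI:96974", "Kitl")],
    "Kitl": [("MGI:96974", "Kitl")],
    "EXT:0001": [("MGI:102709", "Cav1"), ("MGI:1933388", "Bicc1")],
}


def batch_query(input_ids):
    """Map each input ID to every matching gene, one output row per match."""
    rows = []
    for input_id in input_ids:
        matches = ID_INDEX.get(input_id, [])
        if not matches:
            # Keep unmatched inputs visible instead of silently dropping them.
            rows.append((input_id, "", ""))
        for mgi_id, symbol in matches:
            rows.append((input_id, mgi_id, symbol))
    return rows


def to_tab_delimited(rows):
    """Render result rows as tab-delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["input", "mgi_id", "symbol"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_tab_delimited(batch_query(["Kitl", "EXT:0001", "not-an-id"])))
```

Emitting one row per match keeps the output rectangular, which suits the tab-delimited and spreadsheet formats mentioned in the text.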

Other web interfaces to MGD include MouseBLAST 
for sequence similarity searches against a variety of 
rodent-relevant sequence databases, Mouse GBrowse for 
genome centric browsing and MGI's BioMart for searches 
that combine results from MGI and Ensembl. 

MGD's public FTP reports include over 50 flat file 
reports that are generated weekly. Most external inform- 
atics resources that incorporate data from MGD obtain 
their data from these reports. Custom reports are created 
upon request. 

The MGI web services API is a Simple Object Access
Protocol-based interface to the database that provides programmatic
access with the same functionality as the
Batch Query tool described above.



CITING MGD 

For a general citation of the MGD resource, researchers 
should cite this article. In addition, the following citation 
format is suggested when referring to data sets specific to 
the MGD component of MGI: MGD, MGI, The Jackson
Laboratory, Bar Harbor, Maine (URL: http://www.informatics.jax.org).
[Type in date (month, year) when you
retrieved the data cited.]